2025-06-10
Assistant Professor at Yale School of Public Health
Teach statistical modeling and study design
Research focus on infectious disease study design and cluster-randomized trials
Allows use of routinely-collected data
Evaluates interventions in-context
Provides “real world evidence”/population impact
Answers questions randomized trials and observational studies cannot
But … has threats to internal and external validity
8:30–9:00 Introduction to difference-in-differences
9:00–9:45 Advanced DID and staggered adoption
9:45–10:30 Analysis 1: Advanced DID of COVID-19 vaccine mandates
10:40–11:15 Introduction to synthetic control
11:15–11:45 Analysis 3: SC of Ohio’s COVID-19 vaccine lottery
11:45–12:15 Advanced SC methods
12:15–12:30 Analysis 4: Advanced SC of multiple states’ COVID-19 vaccine lotteries
Understand, interpret, and critique the use of DID and SC in epidemiology
Gain familiarity with state-of-the-art methods related to DID and SC and identify resources for further exploration
Contextualize the assumptions needed for causal inference from quasi-experiments
Implement staggered adoption DID and SC analyses and diagnostics/inference in R
I will focus here on infectious disease (ID) examples from published literature with available data. Some issues are specific to ID and others are not, but all of them illustrate how to approach these questions.
All materials: https://github.com/leekshaffer/Epi-QEs/
Two (or more) units: some treated/exposed, some untreated
Two (or more) time periods: some prior to first treatment, some after
Example: South London “Grand Experiment” from Coleman 2024
Untreated: Southwark & Vauxhall only (12 sub-districts)
Treated: Joint Southwark & Vauxhall/Lambeth (16 sub-districts)
Time Periods: 1849 (pre-treatment) and 1854 (post-treatment) outbreaks
| Unit | Pre-Treatment | Post-Treatment |
|---|---|---|
| Exposed | \(Y_{10} = Y_{10}(0)\) | \(Y_{11} = Y_{11}(1)\) |
| Unexposed | \(Y_{00} = Y_{00}(0)\) | \(Y_{01} = Y_{01}(0)\) |
Treatment Effect:
\[\theta = E[Y_{11}(1) - Y_{11}(0)]\]
Within each unit, we have an interrupted time series:
\[ \begin{aligned} \Delta_1 &= Y_{11} - Y_{10} \\ \Delta_0 &= Y_{01} - Y_{00} \end{aligned} \]
Key Idea
Use the observed change \(\Delta_0\) in the unexposed unit as a stand-in for the change the exposed unit would have experienced without treatment, \(Y_{11}(0) - Y_{10}(0)\), which is unobserved.
\[ \begin{aligned} \hat{Y}_{11}(1) &= Y_{11} \\ \hat{Y}_{11}(0) &= Y_{10} + \color{darkgreen}{(Y_{01} - Y_{00})} \\ \hat{\theta} &= \color{purple}{(Y_{11} - Y_{10})} - \color{darkgreen}{(Y_{01} - Y_{00})} \\ \end{aligned} \]
| Supplier | Sub-Districts | 1849 Deaths per 10,000 | 1854 Deaths per 10,000 |
|---|---|---|---|
| Joint Southwark & Vauxhall/Lambeth (Treated) | 16 | 130.1 | 84.9 |
| Southwark & Vauxhall Only (Untreated) | 12 | 134.9 | 146.6 |
| Supplier | 1849 Deaths per 10,000 | 1854 Deaths per 10,000 | Diff, 1854-1849 |
|---|---|---|---|
| Joint Southwark & Vauxhall/Lambeth (Treated) | 130.1 | 84.9 | -45.2 |
| Southwark & Vauxhall Only (Untreated) | 134.9 | 146.6 | 11.8 |
| Diff, Treated-Untreated | -4.8 | -61.8 | -57.0 |
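The arithmetic behind the table can be sketched in a few lines. This is a minimal illustration that uses the rounded rates as displayed; the function and variable names are illustrative assumptions, not taken from the workshop materials. From the rounded inputs the estimate comes out at about -56.9 rather than the table's -57.0, presumably because the table was computed from unrounded rates.

```python
# Death rates per 10,000 as displayed (rounded) in the table above.
y10, y11 = 130.1, 84.9    # treated unit: 1849 (pre), 1854 (post)
y00, y01 = 134.9, 146.6   # untreated unit: 1849 (pre), 1854 (post)


def did_2x2(y10, y11, y00, y01):
    """Two-unit, two-period difference-in-differences."""
    delta_1 = y11 - y10          # observed change in the treated unit
    delta_0 = y01 - y00          # observed change in the untreated unit
    y11_cf = y10 + delta_0       # estimated untreated outcome, Y11(0)
    theta_hat = delta_1 - delta_0
    return delta_1, delta_0, y11_cf, theta_hat


delta_1, delta_0, y11_cf, theta_hat = did_2x2(y10, y11, y00, y01)
print(f"Delta_1 = {delta_1:.1f}, Delta_0 = {delta_0:.1f}")
print(f"Estimated Y11(0) = {y11_cf:.1f}")
print(f"DID estimate = {theta_hat:.1f} deaths per 10,000")
```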
Compare to other possible estimates of \(\hat{Y}_{11}(0)\):
\(Y_{10}\): assumes no time trends
\(Y_{01}\): assumes no differences in units
Modeled trend for \(Y_{1t}\) over time: requires time model and more data
Regress \(Y_{1t}\) on \(Y_{0t}\): requires covariates, additional control units, and/or specific model
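To see how much the choice of counterfactual matters, the first two alternatives can be compared with DID using the rounded cholera rates. This is only a sketch: the dictionary labels are illustrative, and the two model-based alternatives are omitted because they need more data than the 2x2 table provides.

```python
# Rounded death rates per 10,000 from the South London example.
y10, y11 = 130.1, 84.9    # treated unit: 1849, 1854
y00, y01 = 134.9, 146.6   # untreated unit: 1849, 1854

# Each entry is one candidate estimate of the untreated outcome Y11(0).
counterfactuals = {
    "pre-period only": y10,                      # assumes no time trend
    "untreated unit only": y01,                  # assumes no unit difference
    "difference-in-differences": y10 + (y01 - y00),
}

# Effect estimate implied by each counterfactual.
estimates = {name: y11 - cf for name, cf in counterfactuals.items()}

for name, est in estimates.items():
    print(f"{name:28s} {est:7.1f}")
```

With these inputs the three estimates are roughly -45.2, -61.7, and -56.9, so DID falls between the two simpler comparisons here.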
\[ Y_{it} = \alpha_i + \gamma_t + \theta I(X_{it} = 1)+\epsilon_{it}, \]
where:
\(\alpha_i\) is the fixed effect for unit \(i\),
\(\gamma_t\) is the fixed effect for time \(t\),
\(\epsilon_{it}\) is the error term for unit \(i\) in time \(t\), and
\(X_{it}\) is the indicator of whether unit \(i\) is treated at time \(t\).
Note
This is called the two-way fixed effects (TWFE) model for DID.
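In the two-unit, two-period case the TWFE model reduces to the simple DID contrast. A short derivation, assuming \(E[\epsilon_{it}] = 0\) and using the earlier cell notation (units \(i \in \{0, 1\}\), periods \(t \in \{0, 1\}\), with \(X_{11} = 1\) and \(X_{it} = 0\) otherwise):

\[ \begin{aligned} E[Y_{11}] - E[Y_{10}] &= (\alpha_1 + \gamma_1 + \theta) - (\alpha_1 + \gamma_0) = \gamma_1 - \gamma_0 + \theta \\ E[Y_{01}] - E[Y_{00}] &= (\alpha_0 + \gamma_1) - (\alpha_0 + \gamma_0) = \gamma_1 - \gamma_0 \\ \theta &= \left( E[Y_{11}] - E[Y_{10}] \right) - \left( E[Y_{01}] - E[Y_{00}] \right) \end{aligned} \]

The unit effects cancel within each unit, and the common time effect cancels between units, leaving \(\theta\).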
Inference can be conducted using the TWFE regression model, which accounts for variability in the outcome when there are multiple treated/untreated units and multiple periods.
Standard errors are generally clustered by unit to account for correlation within a unit over time. A block bootstrap that resamples whole units is an alternative variance estimator.
Caution
These standard errors capture statistical uncertainty but not uncertainty about the causal assumptions of the model, which cannot be fully assessed statistically.
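A block bootstrap for the DID estimate might look like the sketch below. Everything here is an assumption for illustration: the data are simulated (they are not the cholera sub-district data), the function names are hypothetical, and resampling is done separately within the treated and untreated groups, which is one common choice rather than the only one. Each unit is a block, so its pre and post outcomes are always resampled together.

```python
import random
import statistics


def did_estimate(units):
    """Mean change in treated units minus mean change in untreated units."""
    treated = [u["post"] - u["pre"] for u in units if u["treated"]]
    untreated = [u["post"] - u["pre"] for u in units if not u["treated"]]
    return statistics.mean(treated) - statistics.mean(untreated)


def block_bootstrap_se(units, n_boot=2000, seed=1):
    """Bootstrap SE, resampling whole units with replacement within each group."""
    rng = random.Random(seed)
    treated = [u for u in units if u["treated"]]
    untreated = [u for u in units if not u["treated"]]
    draws = []
    for _ in range(n_boot):
        sample = (rng.choices(treated, k=len(treated))
                  + rng.choices(untreated, k=len(untreated)))
        draws.append(did_estimate(sample))
    return statistics.stdev(draws)


# Simulated (hypothetical) data: 16 treated and 12 untreated units,
# a common trend of +10 and a true effect of -50 built in.
sim = random.Random(0)
units = []
for is_treated, n_units in ((True, 16), (False, 12)):
    for _ in range(n_units):
        pre = sim.gauss(100, 20)
        change = 10 + (-50 if is_treated else 0) + sim.gauss(0, 15)
        units.append({"treated": is_treated, "pre": pre, "post": pre + change})

estimate = did_estimate(units)
se = block_bootstrap_se(units)
print(f"DID estimate: {estimate:.1f}, block-bootstrap SE: {se:.1f}")
```

With only a handful of units the bootstrap distribution can be unstable, so a sketch like this is best read as a complement to clustered standard errors, not a replacement for checking the assumptions.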
Parallel trends (in expectation of potential outcomes):
\[ E[\color{purple}{Y_{11}(0) - Y_{10}(0)}] = E[\color{darkgreen}{Y_{01}(0) - Y_{00}(0)}] \]
No spillover
No anticipation/clear time point for treatment
Placebo/specification tests:
In-time: conduct the same DID analysis on a time period prior to the actual treatment initiation
In-space: conduct the same DID analysis as if an untreated unit were the treated one
Alternative outcome: conduct the same DID analysis on an outcome that should not be affected by the treatment
These approaches can be used either:
as a heuristic justification for the assumption,
to obtain a null distribution for permutation tests, or
to adjust the estimate for the “null” effect (difference-in-difference-in-differences or triple-differences).
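An in-space placebo analysis used as a permutation-style test can be sketched as follows. The unit labels and outcome changes are made up for illustration, and the helper names are hypothetical. Each untreated unit takes a turn as the "treated" unit and is compared with the remaining untreated units; excluding the truly treated unit from the placebo comparisons is a design assumption of this sketch.

```python
import statistics


def did(treated_change, control_changes):
    """DID with one treated unit and the mean of several control units."""
    return treated_change - statistics.mean(control_changes)


def in_space_placebo(changes, treated_id):
    """changes maps each unit to its post-minus-pre change in the outcome."""
    controls = {u: c for u, c in changes.items() if u != treated_id}
    actual = did(changes[treated_id], list(controls.values()))
    placebos = {}
    for unit, change in controls.items():
        others = [c for u, c in controls.items() if u != unit]
        placebos[unit] = did(change, others)
    # Share of estimates (placebos plus the actual one) at least as extreme.
    n_extreme = sum(abs(p) >= abs(actual) for p in placebos.values())
    p_value = (n_extreme + 1) / (len(placebos) + 1)
    return actual, placebos, p_value


# Hypothetical changes in the outcome; unit "A" is the treated unit.
changes = {"A": -40.0, "B": 5.0, "C": 12.0, "D": 8.0, "E": -3.0, "F": 10.0}
actual, placebos, p_value = in_space_placebo(changes, treated_id="A")
print(f"Actual DID: {actual:.2f}")
for unit, est in placebos.items():
    print(f"Placebo with {unit} as treated: {est:.2f}")
print(f"Permutation-style p-value: {p_value:.3f}")
```

With five untreated units the smallest attainable p-value is 1/6, which illustrates why inference is limited when few units are available.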
Changing the scale of the outcome changes the parallel trends assumption. The most common transformation is to use the natural log.
E.g., \(\log(Y_{it}) = \alpha_i + \gamma_t + \theta I(X_{it}=1) + \epsilon_{it}\)
Changes parallel trends assumption to:
\[ \begin{aligned} E[\color{purple}{\log Y_{11}(0) - \log Y_{10}(0)}] &= E[\color{darkgreen}{\log Y_{01}(0) - \log Y_{00}(0)}] \\ E \left[ \log \left( \color{purple}{\frac{Y_{11}(0)}{Y_{10}(0)}} \right) \right] &= E \left[ \log \left( \color{darkgreen}{\frac{Y_{01}(0)}{Y_{00}(0)}} \right) \right] \end{aligned} \]
Caution
In general, parallel trends can hold on at most one scale (for example, the additive or the log scale, but not both)
This changes the estimand (e.g., additive -> multiplicative)
See Kahn-Lang and Lang (2020) for more considerations and Feng and Bilinski (2024) for examples of different scales/specifications.
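The change in estimand can be illustrated with the rounded cholera rates from the South London example. This is a sketch with illustrative variable names: on the log scale, exponentiating the DID estimate gives a ratio of rate ratios, not a difference in rates.

```python
import math

# Rounded death rates per 10,000 from the South London example.
y10, y11 = 130.1, 84.9    # treated unit: 1849, 1854
y00, y01 = 134.9, 146.6   # untreated unit: 1849, 1854

# Additive scale: difference of differences in rates.
theta_additive = (y11 - y10) - (y01 - y00)

# Log scale: difference of differences in log rates.
theta_log = (math.log(y11) - math.log(y10)) - (math.log(y01) - math.log(y00))

# Exponentiating gives a ratio of rate ratios (multiplicative effect).
ratio_of_ratios = math.exp(theta_log)

print(f"Additive DID: {theta_additive:.1f} deaths per 10,000")
print(f"Log-scale DID: {theta_log:.3f}")
print(f"Ratio of rate ratios: {ratio_of_ratios:.3f}")
```

Here the additive estimate is about -56.9 deaths per 10,000, while the log-scale estimate corresponds to a ratio of about 0.60. The two answer different questions and rest on different parallel trends assumptions.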
Incorporating covariates makes the parallel trends assumption conditional on those covariates.
E.g., \(Y_{it} = \alpha_i + \gamma_t + \theta I(X_{it}=1) + \beta Z_{i} + \epsilon_{it}\)
Changes parallel trends assumption to:
\[ E[\color{purple}{Y_{11}(0) - Y_{10}(0)} ~ | ~ Z_1] = E[\color{darkgreen}{Y_{01}(0) - Y_{00}(0)} ~ | ~ Z_0] \]
Caution
This makes the parallel trends assumption more complex to consider and requires modeling covariates
This changes the estimand and assumes the effect is homogeneous across covariates
See Caetano and Callaway (2023) for issues that arise with time-varying covariates.
Estimand Interpretation
DID estimates the Average Treatment Effect on the Treated (ATT).
This may not be generalizable to other units, including the untreated units in the study.
Internal validity may be high if the assumptions are justified.
External validity may be low because of limited transportability of the ATT and limited information on effect heterogeneity.
Incorporating additional units/periods can reduce variance, but may also risk violating the assumptions
DID is generally conducted with a limited set of carefully selected units: low bias but high variance
Examples
More distant vs. closer untreated units
Incorporating more untreated units
Incorporating more recent time periods
Advantages:
Simple to implement
Uses summary data
No need to model time trends or collect covariates
Straightforward interpretation
Disadvantages/Limitations:
Targets ATT not ATE
Need to justify key assumptions
Requires careful selection of controls
Limited inference with few units/periods